18. Generators
How to Use Generators
The images captured in the car simulator are much larger than the images encountered in the Traffic Sign Classifier Project, a size of 160 x 320 x 3 compared to 32 x 32 x 3. Storing 10,000 traffic sign images would take about 30 MB but storing 10,000 simulator images would take over 1.5 GB. That's a lot of memory!
Not to mention that preprocessing can change the data type from an int to a float, which can increase the size of the data by a factor of 4.
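To make the factor-of-4 blow-up concrete, here is a quick NumPy sketch comparing the memory footprint of a single simulator image stored as 8-bit integers versus 32-bit floats:

```python
import numpy as np

# One 160 x 320 x 3 simulator image stored as 8-bit unsigned integers
image_uint8 = np.zeros((160, 320, 3), dtype=np.uint8)
print(image_uint8.nbytes)    # 153600 bytes, so 10,000 images is ~1.5 GB

# Casting to 32-bit floats quadruples the footprint
image_float32 = image_uint8.astype(np.float32)
print(image_float32.nbytes)  # 614400 bytes
```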
Generators can be a great way to work with large amounts of data. Instead of storing the preprocessed data in memory all at once, using a generator you can pull pieces of the data and process them on the fly only when you need them, which is much more memory-efficient.
A generator is like a coroutine, a process that can run separately from the main routine, which makes it a useful Python construct. Instead of using return, a generator uses yield, which still returns the desired output values but also saves the current values of all the generator's variables. When the generator is called again, it resumes right after the yield statement, with all of its variables set to the same values as before.
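To see this pause-and-resume behavior in isolation, consider a minimal counter generator (a toy example, not part of the project code):

```python
def counter():
    n = 0
    while True:
        n += 1
        yield n  # execution pauses here; n is preserved between calls

gen = counter()
print(next(gen))  # 1
print(next(gen))  # 2 -- resumed right after the yield, with n remembered
```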
Below is a short quiz using a generator. This generator appends a new Fibonacci number to its list every time it is called. To pass, simply modify the generator's yield so it returns its list instead of 1.
As a result, we can get the first 10 Fibonacci numbers simply by calling our generator 10 times. If we need to go do something else for a while, we can, and then call the generator again whenever we need more Fibonacci numbers.
Start Quiz:
def fibonacci():
    numbers_list = []
    while 1:
        if len(numbers_list) < 2:
            numbers_list.append(1)
        else:
            numbers_list.append(numbers_list[-1] + numbers_list[-2])
        yield 1  # change this line so it yields its list instead of 1

our_generator = fibonacci()
my_output = []

for i in range(10):
    my_output = next(our_generator)

print(my_output)
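If you want to check your answer after attempting the quiz, one possible solution is to yield the list itself:

```python
def fibonacci():
    numbers_list = []
    while 1:
        if len(numbers_list) < 2:
            numbers_list.append(1)
        else:
            numbers_list.append(numbers_list[-1] + numbers_list[-2])
        yield numbers_list  # yield the whole list instead of 1

our_generator = fibonacci()
for i in range(10):
    my_output = next(our_generator)

print(my_output)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```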
Here is an example of how you could use a generator to load data and preprocess it on the fly, in batch-size portions, to feed into your Behavioral Cloning model.
import os
import csv

samples = []
with open('./driving_log.csv') as csvfile:
    reader = csv.reader(csvfile)
    for line in reader:
        samples.append(line)

from sklearn.model_selection import train_test_split
train_samples, validation_samples = train_test_split(samples, test_size=0.2)

import cv2
import numpy as np
import sklearn
from sklearn.utils import shuffle

def generator(samples, batch_size=32):
    num_samples = len(samples)
    while 1:  # Loop forever so the generator never terminates
        shuffle(samples)
        for offset in range(0, num_samples, batch_size):
            batch_samples = samples[offset:offset+batch_size]

            images = []
            angles = []
            for batch_sample in batch_samples:
                name = './IMG/' + batch_sample[0].split('/')[-1]
                center_image = cv2.imread(name)
                center_angle = float(batch_sample[3])
                images.append(center_image)
                angles.append(center_angle)

            # trim image to only see section with road
            X_train = np.array(images)
            y_train = np.array(angles)
            yield sklearn.utils.shuffle(X_train, y_train)
# Set our batch size
batch_size = 32

# compile and train the model using the generator function
train_generator = generator(train_samples, batch_size=batch_size)
validation_generator = generator(validation_samples, batch_size=batch_size)

from math import ceil
from keras.models import Sequential
from keras.layers import Lambda

ch, row, col = 3, 80, 320  # Trimmed image format

model = Sequential()
# Preprocess incoming data, centered around zero with small standard deviation
model.add(Lambda(lambda x: x / 127.5 - 1.,
                 input_shape=(row, col, ch),
                 output_shape=(row, col, ch)))
model.add(... finish defining the rest of your model architecture here ...)

model.compile(loss='mse', optimizer='adam')
model.fit_generator(train_generator,
                    steps_per_epoch=ceil(len(train_samples) / batch_size),
                    validation_data=validation_generator,
                    validation_steps=ceil(len(validation_samples) / batch_size),
                    epochs=5, verbose=1)
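Because the snippet above depends on driving_log.csv and the IMG directory existing on disk, here is a self-contained sketch of the same batching pattern using synthetic data (the batch_generator name and the fake samples are illustrative, not part of the project). Pulling one batch with next() is a handy way to sanity-check shapes before starting a long training run:

```python
import random
import numpy as np

def batch_generator(samples, batch_size=32):
    """Yield (X, y) batches forever. Here `samples` is a list of
    (image, angle) pairs standing in for rows of driving_log.csv."""
    num_samples = len(samples)
    while True:
        random.shuffle(samples)  # reshuffle at the start of each pass
        for offset in range(0, num_samples, batch_size):
            batch = samples[offset:offset + batch_size]
            X = np.array([image for image, _ in batch])
            y = np.array([angle for _, angle in batch])
            yield X, y

# 100 synthetic 160 x 320 x 3 "camera images" with steering angles
samples = [(np.zeros((160, 320, 3), dtype=np.uint8), 0.0) for _ in range(100)]

X_batch, y_batch = next(batch_generator(samples, batch_size=32))
print(X_batch.shape, y_batch.shape)  # (32, 160, 320, 3) (32,)
```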